Abstract

Background: Spider silks are spectacular examples of phenotypic diversity arising from adaptive molecular 
evolution. An individual spider can produce an array of specialized silks, with the majority of constituent silk 
proteins encoded by members of the spidroin gene family. Spidroins are dominated by tandem repeats flanked by 
short, non-repetitive N- and C-terminal coding regions. The remarkable mechanical properties of spider silks have 
been largely attributed to the repeat sequences. However, the molecular evolutionary processes acting on spidroin 
terminal and repetitive regions remain unclear due to a paucity of complete gene sequences and sampling of genetic 
variation among individuals. To better understand spider silk evolution, we characterize a complete aciniform spidroin 
gene from an Argiope orb-weaving spider and survey aciniform gene fragments from congeneric individuals. 

Results: We present the complete aciniform spidroin (AcSp1) gene from the silver garden spider Argiope argentata
(Aar_AcSp1), and document multiple AcSp1 loci in individual genomes of A. argentata and the congeneric A. trifasciata
and A. aurantia. We find that Aar_AcSp1 repeats have >98% pairwise nucleotide identity. By comparing AcSp1 repeat
amino acid sequences between Argiope species and with other genera, we identify regions of conservation over vast
amounts of evolutionary time. Through a PCR survey of individual A. argentata, A. trifasciata, and A. aurantia genomes,
we ascertain that AcSp1 repeats show limited variation between species whereas terminal regions are more divergent.
We also find that average dN/dS across codons in the N-terminal, repetitive, and C-terminal encoding regions indicates
purifying selection that is strongest in the N-terminal region.

Conclusions: Using the complete A. argentata AcSp1 gene and spidroin genetic variation between individuals, this
study clarifies some of the molecular evolutionary processes underlying the spectacular mechanical attributes of
aciniform silk. It is likely that intragenic concerted evolution and functional constraints on A. argentata AcSp1 repeats
result in extreme repeat homogeneity. The maintenance of multiple AcSp1-encoding loci in Argiope genomes
supports the hypothesis that Argiope spiders require rapid and efficient protein production to support their prolific use
of aciniform silk for prey-wrapping and web-decorating. In addition, multiple gene copies may represent the early
stages of spidroin diversification.
Background

Spider silks are ideal for studying the molecular evolutionary processes that create and maintain adaptive characteristics. An individual spider can produce and use different silk types singly or in combination for specific tasks, with each silk type having mechanical properties well-suited to its function. For example, aciniform silk is used in prey immobilization and egg sac construction [1,2]. The mechanical properties of aciniform silk include impressive extensibility and toughness [1], making it excellent for swathing struggling prey because it is easy to stretch but difficult to break. Orb-weaving garden spiders from the genus Argiope are renowned for their use of aciniform silk. Argiope employ many layers of aciniform silk to completely immobilize and envelop their prey (e.g. [3,4]), and Argiope are also a model system for studying the purpose of aciniform-silk web decorations, known as stabilimenta, that have been implicated in predator avoidance, prey attraction, and web stability (for review see [5,6]).
Of the five fibrous silks spun by the silver garden orb-weaver Argiope argentata, aciniform silk is the toughest and one of the most extensible [7]. However, little is known about the evolution of aciniform silk's physical attributes. Spider silk mechanical properties are related to the suite of proteins that compose each silk type. The majority of spider silk proteins, or spidroins (a contraction of "spider-fibroins" [8]), are encoded by members of a single gene family. Spidroins are typically very large (>200 kDa) and are dominated by a series of iterated repeats flanked by short amino (N)- and carboxy (C)-terminal regions [9,10]. The length, number, and amino acid (aa) composition of the iterated repeats are silk-type specific, whereas phylogenetic analyses have shown that aa residues in the N- and C-terminal regions are more conserved across spidroins [11,12]. Repeat aa sequence corresponds to secondary structures that are partly responsible for silk mechanical properties (e.g. [13-15]), and the conservation of the N- and C-terminal regions [12] and their presence in spun silk fibers suggest an important role in spider silk biology [16-18].

The evolutionary maintenance of spidroin repeat sequences within a silk type and the divergence of those repeat sequences between silk types are central to spider silk function and diversity. Within a particular spidroin, repeat units tend to be highly similar, or homogeneous, in amino acid and nucleotide sequence. The gene encoding aciniform spidroin (AcSp1) has repeats that are relatively complex among spidroin family members; however, despite this complexity, AcSp1 repeats are also spectacularly homogenized [1,19]. A recent analysis of a complete AcSp1 from the western black widow Latrodectus hesperus showed that its repetitive region, like those of other spidroins, is dominated by the amino acids glycine (G), alanine (A), and serine (S) [19]. However, L. hesperus AcSp1 repeats have few or none of the short glycine- and alanine-rich subunits, such as GGX, poly-GA, and poly-A, that can make up the bulk of other spidroin repeats [9]. Nevertheless, L. hesperus AcSp1 repeats are remarkably homogenized (>99% identity at the nucleotide level [19]). This is consistent with results from a partial-length AcSp1 cDNA from the banded garden orb-weaver Argiope trifasciata, which has 14 repeats that are each 600 bp and share 99.9% identity at the nucleotide level [1].

The high level of AcSp1 repeat homogeneity is frequently attributed to gene conversion or unequal crossing over resulting in intragenic concerted evolution (e.g. [1,19,20]). Concerted evolution usually refers to homogenization among gene family members, such as rDNA gene copies [21], but it can also occur within a gene [22,23]. Stabilizing selection alone would maintain protein sequence, resulting in a high level of repeat identity at the aa level. However, the extreme level of homogenization reported for AcSp1 repeats provides evidence for concerted evolution because it exists at both the protein and nucleotide levels [1,19]; selection on protein sequence alone would not be expected to remove synonymous nucleotide differences among repeats.

In addition to concerted evolution, repeat homogeneity in AcSp1 may be maintained by functional constraints. Recent nuclear magnetic resonance (NMR) studies of AcSp1 repeats from both Nephila antipodiana and A. trifasciata delineate different domains in each repeat unit, one that is rich in alpha helices and one that is not [24,25]. Xu et al. [25] used NMR and dihedral angles from global likelihood estimate (DANGLE) analyses to predict the chemical shift indices of a 199 aa recombinant A. trifasciata AcSp1 repeat. The consensus secondary structure assignments specified that the last quarter of the protein was unstructured, but that the first three-quarters of the repeat contained six major helical regions. Protein structures such as these six alpha helices are considered the foundation for silk mechanical properties (e.g. [25,26]).

Assessing the extent to which a spidroin is homogenized within a single gene or among individuals is difficult because the repetitive region makes it exceptionally challenging to sequence complete spidroin genes. Indeed, partial-length sequences that are biased toward the C-terminus greatly dominate the number of published spidroins [12]. Additionally, the evolutionary processes leading to spidroin divergence between species and silk types are often unclear due to a lack of knowledge about spidroin genetic variation among individuals.

Here, we address these issues by presenting a complete spidroin gene from an Argiope spider, the AcSp1 sequence of A. argentata (Aar_AcSp1), and by screening for AcSp1 variation among individual A. argentata, A. trifasciata, and A. aurantia spider genomes. Sequencing the full array of Aar_AcSp1 repeats enabled us to test hypotheses of concerted evolution and functional constraints. Based on previous spidroin research, Aar_AcSp1 repeats should be extremely homogeneous at the nucleotide and amino acid levels. In addition, amino acid sequences that are predicted to correspond to the structural motifs that contribute to the toughness and extensibility of aciniform silk should be more conserved between Argiope species relative to surrounding regions. Among the surveyed A. argentata individuals, we expected Aar_AcSp1 to be a single-copy gene, similar to L. hesperus AcSp1 [19]. Between species, previous research suggests that the spidroin repeats within each silk type are highly conserved but that the terminal regions show more variation [12], and we hypothesized that Aar_AcSp1 would also follow this pattern.

Results and discussion 

Argiope argentata AcSp1 complete sequence and phylogenetic placement

Although 59 AcSp1 cDNA clones had previously been obtained, including one that was >8 kb [1], a complete Argiope AcSp1 remained elusive until the present study. By screening a large-insert genomic DNA library, we sequenced and assembled 18,080 bp of A. argentata DNA, including a complete open reading frame that is 13,440 bp long and predicted to encode a 4,479 aa A. argentata AcSp1 (Aar_AcSp1; Figure 1; GenBank KJ206620). No introns were detected. The putative protein has a predicted size of ~430 kDa, and the most abundant amino acids are serine (22.6%), alanine (14.4%), and glycine (13.3%). Aar_AcSp1 has three regions: a central repetitive region that is flanked by conserved N- and C-terminal regions. The repetitive region makes up ~90% of the protein and is composed of 20 iterated repeats (Figure 1). The first 19 repeats are each 204 aa, and the last repeat is 186 aa due to truncation at the end (Figure 1). The length, amino acid composition, and organization of Aar_AcSp1 are all consistent with other spidroin family members [9].
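The lengths reported above can be cross-checked with simple arithmetic. A minimal sketch, assuming (as is conventional) that the 13,440 bp open reading frame includes the terminal stop codon; the mean residue mass it prints is derived from the ~430 kDa estimate and is not a separately reported value:

```python
# Arithmetic consistency check for the Aar_AcSp1 open reading frame.
# Assumption: the 13,440 bp ORF length includes the stop codon.

ORF_BP = 13_440
CODON_BP = 3

codons = ORF_BP // CODON_BP          # 4,480 codons in total
protein_aa = codons - 1              # minus the stop codon -> 4,479 aa

# Repeat region: 19 full repeats of 204 aa plus one truncated 186 aa repeat.
repeat_aa = 19 * 204 + 186           # 4,062 aa
repeat_fraction = repeat_aa / protein_aa

# One 204 aa repeat corresponds to 612 bp of coding sequence.
repeat_unit_bp = 204 * CODON_BP

# Mean residue mass implied by the ~430 kDa size estimate.
mean_residue_da = 430_000 / protein_aa

print(f"protein length: {protein_aa} aa")
print(f"repeat region: {repeat_aa} aa ({repeat_fraction:.1%} of the protein)")
print(f"repeat unit: {repeat_unit_bp} bp")
print(f"implied mean residue mass: {mean_residue_da:.1f} Da")
```

Under these assumptions the numbers agree: 4,480 codons minus the stop codon give 4,479 aa, the 20 repeats total 4,062 aa (about 90.7% of the protein, matching the ~90% above), and one 204 aa repeat corresponds to the 612 bp repeat unit discussed later. The implied mean residue mass of about 96 Da is below the ~110 Da rule of thumb for proteins, as expected for a protein rich in glycine, alanine, and serine (residue masses of roughly 57, 71, and 87 Da).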

Phylogenetic analyses of Aar_AcSp1 N- and C-terminal coding regions with those from other spidroins grouped Aar_AcSp1 with the Latrodectus (widow spider) AcSp1 sequences in a well-supported clade (bootstrap value = 98%; Figure 2). Latrodectus and Argiope are estimated to have diverged from each other ~175 million years ago (MYA) [27]. Despite this lengthy time period, the recovery of an AcSp1 clade was consistent with prior studies in which spidroin sequences nearly always grouped based on silk type (e.g. [9,10]). The sister group to the AcSp1 clade was TuSp1, tubuliform (egg-case) spidroin, suggesting that these paralogs have a relatively recent common ancestor [19]. Further potential evidence of their shared ancestry is that both of these silk types are used in egg-case construction and both have repeats that are relatively complex compared to other spidroins [28]. In our phylogenetic analysis, a large, weakly supported assemblage of spidroins is sister to the combined AcSp1 and TuSp1 clade. Given the low support, it is unclear which spidroins are most closely related to AcSp1 and TuSp1.

Argiope argentata AcSp1 repeat homogeneity

As expected, Aar_AcSp1 repeats are complex and spectacularly homogenized. Although glycine, alanine, and serine account for ~50% of its repetitive region composition, Aar_AcSp1 has few of the glycine/alanine-rich motifs such as GGX, GPG, poly-GA, and poly-A that are dominant in the dragline major ampullate spidroins (MaSp1, MaSp2) from Argiope and other taxa [9]. At the nucleotide level, the average pairwise percent identity between Aar_AcSp1 repeats is an astonishing 98.7%. Complexity and extreme homogenization are also features of previously described AcSp1 sequences [1,19].
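Average pairwise percent identity is the mean, over every pair of aligned repeats, of the percentage of compared positions that are identical. A minimal sketch, assuming the repeats are already aligned and that columns containing a gap are skipped (one common convention; the exact gap handling behind the 98.7% figure is not stated here). The toy sequences are hypothetical, not AcSp1 data:

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


def mean_pairwise_identity(seqs: list[str]) -> float:
    """Average percent identity over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned repeats (9 nt each).
repeats = ["GCCGCTGGT", "GCCGCTGGA", "GCCGCTGGT"]
print(f"{mean_pairwise_identity(repeats):.2f}%")  # 92.59%
```

With 20 repeats there are 190 pairs; the same function applies unchanged.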

The extreme nucleotide identity of Aar_AcSp1 repeats is consistent with concerted evolution and cannot be easily explained by codon usage bias alone. For example, Aar_AcSp1 codon use is strongly influenced by amino acid position within a repeat, whereas a general codon usage bias would favor the same codons for a given amino acid regardless of position. In our repeat alignment (Additional file 1: Figure S1), the neighboring alanine codons at nucleotide positions 103-105 and 106-108 are GCC and GCT, respectively. GCCGCT is present in the same relative location in all twenty repeats. Similarly, the glycine codons that appear at nucleotide positions 64-66 and 130-132 also consistently use different codons (GGT and GGA, respectively). The same alternative codons are used at the same exact positions throughout most, if not all, of the repeats. Despite a slight skew toward alanine codons that end in adenine (A) or thymine (T) (55.0% GCW, W being the IUPAC ambiguity code for A or T; Additional file 2: Table S5), it is difficult to postulate that selective forces acting at the level of codon usage are responsible for the extensive homogeneity of codon positions found throughout the 612 bp Aar_AcSp1 repeat. Concerted evolution that fixes particular codons at particular locations across repeats provides a clearer explanation.
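Both observations in this paragraph, the GCW skew among alanine codons and the reuse of the same codon at the same repeat position, are straightforward tallies. A minimal sketch, assuming in-frame, gap-free repeat sequences; the function names and toy sequences are hypothetical:

```python
def codons(seq: str) -> list[str]:
    """Split a coding sequence into consecutive in-frame codons."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def gcw_fraction(seq: str) -> float:
    """Fraction of alanine codons (GCN) that end in A or T, i.e. GCW."""
    ala = [c for c in codons(seq.upper()) if c.startswith("GC")]
    if not ala:
        raise ValueError("no alanine codons found")
    return sum(c[2] in "AT" for c in ala) / len(ala)


def codons_at(repeats: list[str], start: int) -> set[str]:
    """Distinct codons at 1-based nucleotide position `start` across aligned repeats."""
    return {r[start - 1:start + 2] for r in repeats}


# Hypothetical toy data, not real AcSp1 sequence.
cds = "GCCGCTGGTGCAGGA"
print(f"GCW: {gcw_fraction(cds):.1%}")  # 2 of 3 alanine codons end in A or T

toy_repeats = ["GCCGCTGGT", "GCCGCTGGA"]
print(codons_at(toy_repeats, 4))  # a single codon shared by every repeat
print(codons_at(toy_repeats, 7))  # two different codons at this position
```

A set of size one from `codons_at` means every repeat uses the same codon at that position, which is the pattern reported for positions 103-105 (GCC) and 106-108 (GCT).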

Analyses of the full array of Aar_AcSp1 iterated repeats were also consistent with two predictions of concerted evolution. First, maximum likelihood (ML) analysis grouped araneid AcSp1 repeats into well-supported, species-specific clades rather than grouping the repeats across species (Figure 3A). Furthermore, nucleotide pairwise identity within each species averaged 98%, but pairwise identity between Aar_AcSp1 repeats and repeats from other species averaged only 78.5% (73.6% vs. Araneus ventricosus, 79.1% vs. A. trifasciata, and 82.8% vs. A. amoena). That repeats are more similar within species than between species, regardless of intragenic repeat position, can be explained by rapid intraspecific spread of genetic variation via unequal crossing over during recombination [22,23].
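The first prediction compares identities of same-species repeat pairs with those of cross-species pairs. A minimal sketch of that partition, assuming pre-aligned repeats labeled by species (labels and sequences are hypothetical); it also confirms that 78.5% is the simple mean of the three between-species values listed above:

```python
from itertools import combinations
from statistics import mean


def identity(a: str, b: str) -> float:
    """Percent identity of two aligned, equal-length sequences (gaps skipped)."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if len(a) != len(b) or not cols:
        raise ValueError("need aligned sequences with ungapped columns")
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def within_between(repeats: list[tuple[str, str]]) -> tuple[float, float]:
    """Mean identity of same-species pairs and of cross-species pairs.

    `repeats` holds (species_label, aligned_sequence) tuples; each group
    must contain at least one pair or statistics.mean raises an error.
    """
    within, between = [], []
    for (sp1, s1), (sp2, s2) in combinations(repeats, 2):
        (within if sp1 == sp2 else between).append(identity(s1, s2))
    return mean(within), mean(between)


# Hypothetical toy repeats: two from one species, one from another.
toy = [("Aar", "AAAA"), ("Aar", "AAAT"), ("At", "TTTT")]
print(within_between(toy))  # (75.0, 12.5)

# The reported 78.5% is the mean of the three between-species values.
between_reported = [73.6, 79.1, 82.8]
print(round(mean(between_reported), 1))  # 78.5
```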

Second, the average nucleotide pairwise identity of the first and last Aar_AcSp1 repeats to the rest of the array is slightly lower, at 96% and 93%, respectively. Less similar first and last repeats are consistent with some models of concerted evolution [29]. However, araneid AcSp1 first and last repeats still grouped within species-specific clades (Figure 3A), suggesting that these repeat sequences are more homogeneous within a gene than those of previously analyzed spidroins. For example, in an analysis of repetitive units from the flagelliform spidroin (Flag) of the golden orb-weaver Nephila clavipes and the congeneric Nephila inaurata madagascariensis, the first repeats grouped together across species, and the last repeats also formed their own clade. By contrast, the central (not first or last) repeats formed species-specific clades because each repetitive unit was nearly identical within each species yet divergent across species [30]. Longer estimated divergence times between the species in our present study may explain the more thorough homogenization of araneid AcSp1 sequences compared to that of the previously studied Nephila Flag sequences. The estimated divergence time between the Nephila species is ~7.4 MYA [31], whereas Araneus and Argiope are thought to have diverged ~30 MYA, and, within Argiope, A. argentata and A. trifasciata ~23 MYA [32].

Figure 1 Schematic of the protein encoded by the complete Argiope argentata aciniform spidroin 1 gene (GenBank KJ206620). Predicted protein is 4,479 aa. Conserved spidroin N- (orange) and C-terminal (blue) domains (shaded boxes) flank 20 iterated repeats (numbered boxes). Boxes are drawn to scale and standardized to the 204 aa length of the first 19 repeats. Arrows point to corresponding amino acid sequences for each domain. Exemplar repeat 11 has 100% identity to the majority rule consensus of the repeat sequences. Alanine (red), serine (purple), and glycine (green) are shaded to emphasize the abundance of those amino acids.

Functional constraints on AcSp1 repeats

We predicted that functional constraints would result in greater aa sequence conservation in the portion of each repeat proposed by Xu et al. to contain six alpha helices [25]. To test this, we first compared known Araneidae repetitive regions. We aligned consensus AcSp1 repeat sequences from three Argiope species (A. argentata, A. trifasciata, A. amoena) and Araneus ventricosus. We then graphed pairwise identities for each aligned position between the A. trifasciata repeat sequence and the other species and plotted them against the predicted A. trifasciata domains from Xu et al. [25] (Figure 3B). Using amino acid positions from Xu et al. [25], the average percent pairwise identity over the 150 aa helix-rich domain was 84.0%, but only 54.1% over the remaining 49 aa. Xu et al. [25] also noted a major alpha-helical domain from 102-151 aa, encompassing the regions denoted as helices 5 and 6 in Figure 3B. Consistent with this domain being structurally important, the average percent identity in this domain was 90.7%. Moreover, our alignment was slightly longer (216 aa) than the A. trifasciata recombinant repeat length (199 aa) due to indels that only appeared in the unstructured region. Notably, in the region from 200-209 aa (our alignment), the A. trifasciata repeat has a deletion (Figure 3B). These indels further indicate that the final quarter of AcSp1 repeats is less conserved than the first three-quarters.
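Figure 3B shows, for each aligned position, what share of the three other species carries the A. trifasciata residue (100%, 66%, or 33%). A minimal sketch of that per-position tally and of averaging it over a domain follows. It is one plausible reading of the calculation, not necessarily the exact procedure behind the 84.0% and 54.1% values, and the toy alignment is hypothetical:

```python
def per_position_identity(reference: str, others: list[str]) -> list[float]:
    """Percent of `others` sharing the reference residue at each aligned column.

    Gap characters are compared literally, so a gap never matches a residue.
    """
    if not others or any(len(o) != len(reference) for o in others):
        raise ValueError("need one or more sequences aligned to the reference length")
    return [
        100.0 * sum(o[i] == residue for o in others) / len(others)
        for i, residue in enumerate(reference)
    ]


def domain_mean(identities: list[float], start: int, end: int) -> float:
    """Mean identity over 1-based, inclusive alignment positions start..end."""
    window = identities[start - 1:end]
    if not window:
        raise ValueError("empty window")
    return sum(window) / len(window)


# Hypothetical toy alignment: one reference and three other species.
reference = "GSAA"
others = ["GSAS", "GSSS", "GAS-"]
ids = per_position_identity(reference, others)
print([round(x) for x in ids])           # [100, 67, 33, 0]
print(round(domain_mean(ids, 1, 2), 1))  # 83.3
```

For the real data, the two windows would be the 150 aa helix-rich domain and the remaining 49 aa, using the coordinates from Xu et al. [25].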

To investigate amino acid conservation in the predicted AcSp1 helical regions across greater evolutionary time, we also aligned consensus amino acid AcSp1 repeat sequences from L. hesperus and Uloborus diversus to the A. trifasciata repeat from Xu et al. [19,25]. Araneidae, represented here by Argiope and Araneus, and Theridiidae, represented by L. hesperus, are members of the superfamily Araneoidea, with araneids and theridiids estimated to have last shared a common ancestor ~175 MYA [27]. U. diversus is within the Deinopoidea, the sister group to the Araneoidea. Araneoids and deinopoids diverged from each other ~210 MYA [27]. Together, Araneoidea and Deinopoidea compose the Orbiculariae (orb-web weaving spiders and their relatives).

The AcSp1 repeat units from L. hesperus and U. diversus are almost twice as long as the araneid AcSp1 repeat units. The L. hesperus and U. diversus repeat units can be further subdivided into two parts that align with each other [19]. We aligned each part from each species (two parts per species) to the A. trifasciata repeat separately. We then calculated the average pairwise percent identity for each comparison and for each of the six putative alpha-helical regions predicted by Xu et al. ([25]; Additional file 2: Table S6). The overall pairwise identities of L. hesperus repeat part 1 and U. diversus repeat part 1 with the A. trifasciata repeat were 30% and 29%, respectively. Of note, the pairwise identity between L. hesperus repeat part 1 and the A. trifasciata repeat was 47% in the A. trifasciata region associated with helix 4, and the identity between U. diversus repeat part 1 and the A. trifasciata repeat was 41% in the region associated with helix 6. These were the highest pairwise identities observed.
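The same identity calculation can be restricted to the putative helical regions. A minimal sketch, assuming each repeat part has already been aligned to the A. trifasciata repeat; the region names and coordinates below are placeholders, not the helix boundaries reported by Xu et al. [25]:

```python
def region_identity(a: str, b: str,
                    regions: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Percent identity of two aligned sequences within each named region.

    Region bounds are 1-based and inclusive; gap columns are skipped.
    A region with no ungapped columns is reported as NaN.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    result = {}
    for name, (start, end) in regions.items():
        cols = [(x, y) for x, y in zip(a[start - 1:end], b[start - 1:end])
                if x != "-" and y != "-"]
        result[name] = (100.0 * sum(x == y for x, y in cols) / len(cols)
                        if cols else float("nan"))
    return result


# Hypothetical coordinates and toy sequences (not real helix boundaries).
regions = {"helix_a": (1, 4), "helix_b": (5, 8)}
print(region_identity("GSAAGSAA", "GSAAGTTT", regions))
# {'helix_a': 100.0, 'helix_b': 25.0}
```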

Our results strongly support the hypothesis that functional constraints are acting to conserve protein sequence in the repetitive region of AcSp1. Our comparison of the A. trifasciata AcSp1 repeat sequence with that of other araneid species indicates that a specific amino acid sequence is maintained in the predicted helix-rich domain of AcSp1 repeats across Araneidae. In contrast, comparison of the A. trifasciata repeat with part 1 of repeats from L. hesperus and U. diversus indicates that the amino acid sequences in the regions associated with alpha helices 4 and 6 are the most highly conserved across Orbiculariae. The higher level of conservation in the amino acid sequences corresponding to helices 4 and 6 may indicate that these regions impart the same general function across Orbiculariae, whereas the other predicted helical regions of A. trifasciata impart functions unique to Araneidae. Sequencing AcSp1 from other genera of Araneidae and other families of Orbiculariae will enable further tests of these hypotheses.

Figure 2 Maximum likelihood tree of concatenated N- and C-terminal coding regions from 29 published spidroins and Aar_AcSp1 from this study (accession numbers in Additional file 2: Table S2). Box highlights aciniform clade, with Argiope argentata AcSp1 further indicated in red. Vertical bars identify clades by silk type. Bootstrap values greater than 70% are shown. Abbreviations defined in Additional file 2: Table S2. Scale bar represents substitutions per site.

To our knowledge, there are no current predictions about the secondary structures of L. hesperus or U. diversus AcSp1 repeats. It is possible that, like the AcSp1 domains of N. antipodiana and A. trifasciata, L. hesperus and U. diversus AcSp1 repeats also feature distinct structural regions. Finally, our analysis may underestimate sequence conservation because it does not account for amino acid replacements that are functionally equivalent. However, predicting functional protein similarity is difficult given the extensive physicochemical changes that spider silk undergoes as it is processed from a liquid into dry silk (e.g. [33,34]).

Delineation of AcSp1 variants in individual Argiope spiders

Spidroin sequence variation between individual spiders is an important source of genetic variation for the evolution of different silk types within and between species. To investigate genetic variation in AcSp1 between individuals of A. argentata and the congeneric A. trifasciata and A. aurantia, we first designed PCR primers targeting the repetitive region of Aar_AcSp1. Amplification of genomic DNA across species and individuals resulted in AcSp1 repeat sequences that did not show intraspecific variation but had significant interspecific variation (Figure 4A). Intraspecific homogenization of the repeats could be explained by biased PCR amplification of a single repeat type in the repetitive region; however, our results are consistent with the high degree of repeat sequence conservation in AcSp1 sequences from araneids (Figure 3A) and L. hesperus [19].

Figure 3 Concerted evolution and selection on the repetitive region of AcSp1. (A) Iterated AcSp1 repeats show intraspecific homogenization in the family Araneidae. Midpoint-rooted maximum likelihood tree of AcSp1 DNA repeats (R) from A. amoena (Aam; purple), A. argentata (Aar; orange), A. trifasciata (At; green), and Araneus ventricosus (Av; black). Repeats are numbered from 5' to 3'. Bootstrap values for species-specific groups are shown. Scale bar indicates substitutions per site. (B) Functional constraint on repeat sequence. Graph of the pairwise identities of consensus AcSp1 repeat sequences from two Argiope species (A. argentata and A. amoena) and Araneus ventricosus to that of Argiope trifasciata. Bars show 100% (green), 66% (yellow), or 33% (red) identity at each position, helical domains found by NMR and DANGLE analyses of A. trifasciata AcSp1 repeat

Figure 4 Multiple AcSp1 loci in Argiope. Maximum likelihood nucleotide trees of PCR-amplified sequences from the (A) repetitive, (B) N-terminal, and (C) C-terminal coding regions of AcSp1 from three Argiope species, A. argentata (Aarg; orange), A. aurantia (Aau; blue), and A. trifasciata (At; green). Araneus ventricosus (Av; black) sequence (GenBank HQ008714) was used to root repeat and C-terminal trees. N-terminal tree is midpoint-rooted (B). For each variant, the adjacent table row indicates the status of that variant in individual spiders. Each individual per species was assigned a number that appears in the corresponding row if a variant was detected. Bootstrap values of 100% are shown. * denotes outgroup or published A. trifasciata sequence (GenBank AY426339). Scale bar represents substitutions per site. Accession numbers for sequences generated in this study are given in Additional file 2: Table S4.

Next, we designed PCR primers targeting the N- and C-terminal coding regions of Aar_AcSp1. We then amplified the same individual genomic DNAs that were surveyed for the repetitive region (Figure 4). Unlike the repetitive region PCR, direct sequencing of terminal region PCR products resulted in extensive multiple peaks and, in some cases, poor sequencing reads due to length differences. Thus, all terminal region PCR products were cloned, a total of 385 amplicons were sequenced, and AcSp1 variants were diagnosed. Each variant was supported by at least two amplicons with sequences that had greater than 95% identical bases (Figure 4; see Methods). Unlike the repetitive region sequences, which showed no intraspecific or allelic variation, all terminal region amplifications were heterogeneous.
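The variant-calling rule stated here (each variant supported by at least two amplicons sharing more than 95% identical bases) can be sketched as a greedy clustering. This is a hypothetical illustration that assumes pre-aligned, equal-length reads; the study's actual procedure is given in its Methods and may differ, for example in how clusters are seeded:

```python
def identity_fraction(a: str, b: str) -> float:
    """Fraction of identical bases between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def call_variants(amplicons: list[str], threshold: float = 0.95,
                  min_support: int = 2) -> list[list[str]]:
    """Greedily group amplicons and keep groups with enough support.

    Each amplicon joins the first group whose founding sequence it matches
    at more than `threshold` identity; otherwise it founds a new group.
    Groups with fewer than `min_support` amplicons are discarded.
    """
    groups: list[list[str]] = []
    for seq in amplicons:
        for group in groups:
            if identity_fraction(seq, group[0]) > threshold:
                group.append(seq)
                break
        else:
            groups.append([seq])
    return [g for g in groups if len(g) >= min_support]


# Hypothetical 40 bp reads.
amplicons = [
    "A" * 40,
    "A" * 39 + "T",  # 97.5% identical to the first read
    "C" * 40,
    "C" * 40,
    "G" * 40,        # singleton: lacks a second supporting amplicon
]
variants = call_variants(amplicons)
print([len(g) for g in variants])  # [2, 2]
```

Greedy seeding is order-dependent; a more robust variant would compare each read with all cluster members or use single- or average-linkage clustering.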

The number of variants characterized was surprising because every individual spider surveyed had more than two terminal region variants, indicating that these Argiope species must have multiple AcSp1-encoding loci. Argiope spiders are not known to be polyploid, and a diploid individual carries at most two alleles at a single locus; thus, multiple gene copies per genome are the only explanation for more than two N- or C-terminal region variants in a single individual. For example, we found seven C-terminal variants in one A. argentata, suggesting at least four AcSp1 gene copies (Figure 4C). Each A. trifasciata individual possessed a minimum of seven N- or C-terminal variants, again indicating at least four gene copies (Figure 4B, C). Likewise, an A. aurantia individual possessed six N-terminal variants but only two C-terminal variants (Figure 4B, C). The smaller number of C-terminal variants could be explained by a lack of variation in the C-terminal region or by incomplete sampling of variants by the PCR survey.
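The copy-number inference is a lower bound: with at most two alleles per locus in a diploid, n distinct variants require at least ⌈n/2⌉ loci. A minimal sketch, assuming diploidy and that every variant is a genuine allele rather than a PCR or cloning artifact (the function name is hypothetical):

```python
import math


def min_loci(n_variants: int, ploidy: int = 2) -> int:
    """Smallest number of loci that can carry n distinct variants.

    Each locus contributes at most `ploidy` alleles to one individual.
    """
    if n_variants < 1 or ploidy < 1:
        raise ValueError("n_variants and ploidy must be positive")
    return math.ceil(n_variants / ploidy)


for label, n in [("A. argentata C-terminal", 7),
                 ("A. trifasciata N- or C-terminal", 7),
                 ("A. aurantia N-terminal", 6)]:
    print(f"{label}: {n} variants -> at least {min_loci(n)} loci")
```

Seven variants give at least four loci, matching the text; the same rule gives at least three loci for the six A. aurantia N-terminal variants.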

ML analysis of sequences from the PCR survey shows that the branch lengths in the repetitive region tree (Figure 4A) are shorter than the branch lengths of the terminal region trees (Figure 4B, C). The majority of N- and C-terminal variants cluster into well-supported, species-specific groups, and intraspecific branch lengths are very short compared to interspecific branch lengths (Figure 4B, C). One exception is A. argentata C-terminal coding region variant VI, which forms a weakly supported group with A. trifasciata and A. aurantia C-terminal coding region variants (Figure 4C). Given the weak clade support, this variant is probably an outlier that is not as homogenized as the other A. argentata variants.

The shorter branch lengths of the repetitive region variants tree compared to those of the N- and C-terminal region trees suggest that the repetitive region is the most conserved araneid AcSp1 region (Figure 4). Yet, comparison of the average ratio of nonsynonymous to synonymous substitution rates (dN/dS) across codons implies that the N-terminal region has been subject to slightly stronger purifying selection than the repetitive and C-terminal regions (dN/dS of 0.20 vs. 0.30 and 0.42, respectively). dN/dS estimates, however, assume independence of sites and thus are confounded by factors such as concerted evolution and recombination. The full-length Aar_AcSp1 and other AcSp1 sequences provide extensive evidence that the repetitive region units are most likely not evolving independently of each other (Figure 3A; [1,19]). Thus, concerted evolution and purifying selection both must play a role in the near-perfect homogeneity of Argiope AcSp1 iterated repeats. Recombination can also affect tests of selection [35]. Because we could not conclusively determine the exact number of loci within an individual or assign alleles to specific loci, we were unable to assess recombination between loci. Subsequent analyses with additional data could address the impact of recombination on dN/dS estimates.
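dN/dS (ω) is the ratio of nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site. Values below 1 indicate purifying selection, and lower values indicate stronger constraint, which is why 0.20 ranks the N-terminal region as the most constrained. To make the site-independence assumption concrete, here is a simplified pairwise counting estimate in the style of Nei and Gojobori (1986) with a Jukes-Cantor correction. It is an illustration under stated simplifications, not the codon-site method used in this study: it skips codons with gaps or ambiguities, stop codons, and codon pairs differing at more than one position, and it counts changes to stop codons as nonsynonymous when tallying sites.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites: share of single-base changes that keep the amino acid.

    Changes to stop codons are counted as nonsynonymous in this sketch.
    """
    aa = CODE[codon]
    syn = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                syn += CODE[mutant] == aa
    return syn / 3


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a proportion of differences (p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> tuple[float, float]:
    """Simplified pairwise (dN, dS) for two aligned, in-frame coding sequences."""
    sites_s = sites_n = diff_s = diff_n = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # gap, ambiguity, or stop codon
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff > 1:
            continue  # multi-hit codons would need pathway averaging
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        sites_s += s
        sites_n += 3 - s
        if n_diff == 1:
            if CODE[c1] == CODE[c2]:
                diff_s += 1
            else:
                diff_n += 1
    if sites_s == 0 or sites_n == 0:
        raise ValueError("no comparable codons")
    return jukes_cantor(diff_n / sites_n), jukes_cantor(diff_s / sites_s)


ref = "GCCGCTGGTAGC"
syn_variant = "GCAGCTGGTAGC"     # GCC -> GCA, Ala -> Ala
nonsyn_variant = "ACCGCTGGTAGC"  # GCC -> ACC, Ala -> Thr
print(dn_ds(ref, syn_variant))
print(dn_ds(ref, nonsyn_variant))
```

Every codon contributes to the site and difference tallies as if it evolved independently. Repeats homogenized by concerted evolution share substitutions through gene conversion or unequal crossing over rather than acquiring them independently, which is why such estimates can be confounded.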

Previous work with AcSp1 sequences did not find evidence for multiple loci [1,19]. The lack of variation among A. trifasciata AcSp1 cDNA clones [1] may be due to overexpression or preferential cloning of one variant and thus its preponderance in the characterized cDNAs. Alternatively, consistent depletion of aciniform silk may be required to stimulate transcription of multiple AcSp1 loci. This hypothesis is supported by a significant increase in aciniform-silk-dependent web-decorating behavior in three species of Argiope in response to a two-week period of aciniform silk depletion [4]. Future work could compare the number of AcSp1 variants expressed by spiders consistently depleted of aciniform silk with the number expressed by spiders that are not depleted.

A survey of individual L. hesperus genomes also did not find AcSp1 variants [19]. However, the detection of multiple AcSp1 loci in Argiope but not Latrodectus is consistent with the hypothesis that Argiope spiders maintain multiple gene copies as a strategy for efficiently producing large amounts of protein. In contrast with Argiope spiders, Latrodectus spiders use markedly fewer strands of aciniform silk during prey capture [3] and do not make stabilimenta. Increased AcSp1 copy number in Argiope spiders may therefore be a strategy for increasing protein production [36]. Because spidroins are costly, highly expressed proteins [37,38], resource abundance in the form of prey availability may also stimulate aciniform spidroin production in Argiope to prepare for resource scarcity [39,40]. Precedent for this strategy exists. In the bacterium Escherichia coli, multiple copies of rRNA operons provide a competitive advantage by enabling increased growth rates and decreased cell division lag time in environments where resources fluctuate rapidly [41,42].

Previous research has found variants of other spidroins [43-46] and has shown that the dragline spidroin MaSp1 is encoded by multiple loci in several species [47,48]. Unlike A. argentata AcSp1 variants, the C-terminal coding region of L. hesperus MaSp1 is nearly identical across loci [47]. This difference could indicate functional constraints on the C-terminal coding region of MaSp1 that either differ from or are not acting on AcSp1. A comparison of structural component predictions from the amino acid sequences of terminal regions across different spidroins would greatly inform our understanding of the contribution of the terminal regions to the evolution of different spider silk types.

Conclusions 

The highly similar iterated repeat array of our complete Argiope argentata AcSp1 gene, combined with sequence conservation of functionally important regions of individual repeats, supports a hypothesis of concerted evolution and functional constraints acting together to homogenize Aar_AcSp1 repeats. In addition, several terminal region variants per individual Argiope genome indicate multiple Argiope AcSp1 loci. Across AcSp1 loci within an individual, we found homogenization of the repetitive region but variation at the terminal coding regions. We also found evidence for stronger purifying selection in the N-terminal region than in the repetitive or C-terminal regions, suggesting that the N-terminal region is the most constrained portion of the aciniform spidroin. The maintenance of multiple copies of AcSp1 in Argiope genomes underscores the importance of aciniform silk in Argiope ecology and evolution. Indeed, variation between individuals and multiple gene copies within individuals could provide a method for the rapid synthesis of aciniform silk in this genus, and may represent the early stages of the differentiation that led to the extraordinary sequence and functional diversity of spider silks.

Methods 

Isolation and sequencing of an AcSp1-containing BAC clone

A bacterial artificial chromosome (BAC) library was
constructed by Rx Biosciences (Rockville, MD) with Argiope
argentata genomic DNA inserted into the pCC1BAC vector
(Epicentre, Madison, WI). Colony pools were PCR-screened
for AcSp1 with primers designed from the repetitive region
of Argiope trifasciata AcSp1 (Additional file 2: Table S1),
resulting in one positive clone. The positive clone was
digested with restriction enzymes, and a ~17 kb HindIII
fragment of the full insert was found to contain the
complete AcSp1 gene.

The 17 kb fragment was gel-purified with the S.N.A.P.
UV-Free Gel Purification Kit (Invitrogen, Carlsbad, CA),
ligated into HindIII-digested pZErO™-2 plasmid (Invitrogen),
and transformed into TOP10 cells (Invitrogen). Seven
plasmid clones with the expected insert size and restriction
enzyme digest patterns were end-sequenced with
M13 and SP6 primers to identify the orientation of the inserts.
Two clones (one of each insert orientation) were
triple-digested with SpeI, XbaI, and XhoI, and the fragments
were gel-purified. The two clones were also
single-digested with BamHI, and the largest fragment from
each digest (5.5 kb or 5.9 kb, composed of the vector
and either a 2.2 kb or 2.6 kb insert fragment) was
gel-purified and re-circularized to produce subclones. The
triple digest produced a 12.4 kb SpeI/XbaI fragment
that was gel-purified and subcloned into SpeI-digested
pZErO-2 plasmid (SpeI and XbaI produce compatible CTAG
overhangs). End-sequencing the subclones revealed
that the 2.6, 12.4, and 2.2 kb inserts corresponded to the
AcSp1 N-terminal, repetitive, and C-terminal encoding
regions, respectively. The 2.6 and 2.2 kb fragments
were sequenced in their entirety using primer walking
(Additional file 2: Table S1). Because the 12.4 kb fragment
contained repetitive nucleotide sequence, primer
walking was not feasible: a primer designed within one
repeat would anneal to many near-identical repeats.
Instead, the 12.4 kb fragment was bidirectionally
sequenced in its entirety using the transposon-based
EZ-Tn5 <TET-1> Insertion Kit (Epicentre Biotechnologies).
The complete contig of the 17 kb genomic fragment was
manually assembled with Sequencher 4.5 (Gene Codes,
Ann Arbor, MI). An additional 1 kb of genomic sequence
immediately adjacent to the 3' end of the 17 kb fragment
was determined by primer walking (Additional file 2:
Table S1) using the original BAC clone as template DNA.
The complete Argiope argentata AcSp1 gene was deposited
in GenBank under accession number KJ206620.
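
As a quick consistency check on the subcloning strategy, the reported fragment sizes can be added up. This sketch assumes that the 5.9 kb and 5.5 kb BamHI products carry the 2.6 kb (N-terminal) and 2.2 kb (C-terminal) inserts, respectively, since that is the only pairing that gives the same vector backbone size in both subclones; the 500 bp tolerance on the approximately 17 kb HindIII fragment is also an assumption. The function name is hypothetical.

```python
# Fragment sizes (bp) as reported in the Methods text.
INSERTS = {"N-terminal": 2_600, "repetitive": 12_400, "C-terminal": 2_200}
# Assumed pairing of each recircularized BamHI product with its insert.
BAMHI_PRODUCTS = {"N-terminal": 5_900, "C-terminal": 5_500}
HINDIII_FRAGMENT = 17_000  # reported as approximately 17 kb


def check_subclone_sizes(inserts, bamhi_products, expected_total, tolerance=500):
    """Return (summed insert size, implied vector size per BamHI subclone, within tolerance?)."""
    total = sum(inserts.values())
    # Vector backbone = recircularized product minus the insert it carries.
    vectors = {region: size - inserts[region] for region, size in bamhi_products.items()}
    return total, vectors, abs(total - expected_total) <= tolerance


total, vectors, consistent = check_subclone_sizes(INSERTS, BAMHI_PRODUCTS, HINDIII_FRAGMENT)
print(total, vectors, consistent)
```

The three inserts sum to 17.2 kb, and both BamHI subclones imply the same ~3.3 kb backbone, which is consistent with the three subclones together covering the HindIII fragment.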

Inter- and intraspecific sampling of N-terminal, repetitive,
and C-terminal coding region fragments

Genomic DNA was extracted from single legs removed
from four A. argentata, one A. aurantia, and two
A. trifasciata individuals using the DNeasy Blood & Tissue
Kit (Qiagen, Valencia, CA). N-terminal, repetitive, and
C-terminal encoding fragments of AcSp1 were PCR-amplified
using primers designed from the complete A. argentata
AcSp1 gene (Additional file 2: Table S1).

PCR products of the expected size were purified using
the AccuPrep Gel Purification Kit (Bioneer Inc., Alameda,
CA) and sequenced directly. If a chromatogram had
overlapping peaks, indicative of heterogeneous
amplification, the product was ligated into the pJET1.2
plasmid (Thermo Scientific) and transformed into TOP10
cells. Individual colonies were PCR-amplified using the
pJET1.2 Forward and Reverse sequencing primers, and inserts
of the expected size were gel-purified and sequenced. If
one variant was highly abundant, additional colonies
were PCR-amplified and digested with restriction enzymes
to identify clones carrying the abundant variant. The
remaining undigested PCR products containing the rare
variant were purified and sequenced.
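
The text does not say which restriction enzymes were used to tell abundant from rare clones. Purely as an illustration, the sketch below predicts top-strand fragment lengths for a small, hypothetical enzyme panel and lists the enzymes whose predicted digest pattern differs between two variants; the panel, the toy sequences, and the function names are assumptions, while the recognition sites and cut positions are the standard ones for these enzymes.

```python
# Recognition site and top-strand cut offset for a small illustrative panel.
# All four sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "XbaI": ("TCTAGA", 1),     # T^CTAGA
}


def fragment_lengths(seq, enzyme):
    """Predicted fragment lengths (bp) after complete digestion of a linear amplicon."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def diagnostic_enzymes(abundant, rare):
    """Enzymes whose predicted digest pattern differs between the two variants."""
    return [name for name in ENZYMES
            if fragment_lengths(abundant, name) != fragment_lengths(rare, name)]


abundant = "CCCCAAGCTTGGGG"  # hypothetical variant carrying a HindIII site
rare = "CCCCAAGGTTGGGG"      # hypothetical variant lacking it
print(diagnostic_enzymes(abundant, rare))     # ['HindIII']
print(fragment_lengths(abundant, "HindIII"))  # [5, 9]
```

Comparing whole fragment patterns, rather than only the presence of a site, also flags enzymes that cut both variants at different positions.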

Diagnosing variants 

Nucleotide sequences of the PCR fragments from
each species were aligned as described below. For variant
diagnosis, single nucleotide polymorphisms (SNPs)
present in only one clone were attributed to Taq
polymerase error and ignored. If a sequence had a pattern
of polymorphism that was not present in at least one other
clone, the sequence was discarded. Neighbor-joining trees
were then used to visualize highly similar sequences.
Clusters with greater than 95% identical sites were
considered a variant group. With the exception of the
cluster for A. trifasciata C-terminal coding variant 14
(95.2% identical sites), all clusters had greater than 98%
identical bases. Clustered sequences were extracted and
aligned to derive the majority-rule consensus for that
variant. Each variant is therefore supported by at least
two sequences. Variants were deposited in GenBank under
accession numbers KJ206570-KJ206619.
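
The variant-diagnosis rules can be sketched as follows, assuming the clone sequences are already aligned to equal length. Two simplifications are made and are not the authors' procedure: singleton SNPs are masked as N rather than ignored by inspection, and single-linkage clustering at the >95% identity threshold stands in for visual inspection of neighbor-joining trees. The sequence-level discard rule is not modeled, and all function names are hypothetical.

```python
from collections import Counter


def mask_singleton_snps(seqs):
    """Replace with 'N' any base that occurs in only one clone at a variable column.

    Stand-in for treating a SNP seen in a single clone as Taq error.
    """
    masked = [list(s) for s in seqs]
    for i, column in enumerate(zip(*seqs)):
        counts = Counter(column)
        if len(counts) < 2:
            continue  # invariant column: nothing to mask
        for row in masked:
            if counts[row[i]] == 1:
                row[i] = "N"
    return ["".join(row) for row in masked]


def percent_identity(a, b):
    """Percent identical columns, skipping masked (N) and shared-gap columns."""
    pairs = [(x, y) for x, y in zip(a, b)
             if "N" not in (x, y) and not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def cluster(seqs, threshold=95.0):
    """Single-linkage clusters: two sequences link if identity exceeds threshold."""
    clusters = []
    for idx, seq in enumerate(seqs):
        linked = [c for c in clusters
                  if any(percent_identity(seq, seqs[j]) > threshold for j in c)]
        merged = [idx]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return clusters


def majority_consensus(seqs):
    """Majority-rule consensus, ignoring masked positions."""
    consensus = []
    for column in zip(*seqs):
        counts = Counter(b for b in column if b != "N")
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)


def diagnose_variants(seqs, threshold=95.0, min_support=2):
    """Mask singleton SNPs, cluster, and return one consensus per supported cluster."""
    cleaned = mask_singleton_snps(seqs)
    return [majority_consensus([cleaned[i] for i in members])
            for members in cluster(cleaned, threshold)
            if len(members) >= min_support]


clones = [
    "AAAAAAAAAACCCCCCCCCC",
    "AAAAAAAAAACCCCCCCCCC",
    "AAAAAAAAAACCCCCCCCCG",  # last base is a singleton SNP
    "GGGGGGGGGGTTTTTTTTTT",
    "GGGGGGGGGGTTTTTTTTTT",
]
print(diagnose_variants(clones))
```

The `min_support=2` default mirrors the statement that each variant is supported by at least two sequences.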

Phylogenetic analyses 

The conserved spidroin N- and C-terminal regions from
the complete A. argentata AcSp1 were aligned to 29
published spidroins that also have both N- and C-terminal
regions (accession numbers in Additional file 2: Table S2)
using ClustalW [49] implemented in Geneious v6.1.6
(Biomatters Ltd., Auckland, NZ). The N- and C-terminal
regions were aligned separately with default settings, and
the alignments were adjusted by eye. The amino acid
alignments guided the nucleotide alignments. N- and
C-terminal encoding region alignments were concatenated
for phylogenetic analyses of spidroin paralogs (Additional
file 1: Figure S2). Despite potential recombination and
convergence in the N- and C-terminal encoding regions,
previous research found no conflict between strongly
supported nodes in separate N- and C-terminal trees, and
found that concatenating the terminal regions provides
greater evolutionary resolution [12].
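
Concatenating the two terminal-region alignments amounts to joining each taxon's N- and C-terminal sequences and recording where each partition starts and ends. The sketch below assumes each alignment is a dictionary of taxon name to aligned sequence and keeps only taxa present in both, matching the restriction to spidroins with both terminal regions; the toy lengths are chosen to reproduce the partition boundaries given for Figure S2 (positions 1-510 and 511-840), and the function name is hypothetical.

```python
def concatenate_alignments(n_term, c_term):
    """Concatenate two alignments taxon by taxon and report 1-based partitions.

    n_term, c_term: dicts mapping taxon name -> aligned sequence.
    Only taxa present in both alignments are kept.
    """
    lengths = []
    for aln in (n_term, c_term):
        sizes = {len(seq) for seq in aln.values()}
        if len(sizes) != 1:
            raise ValueError("all sequences in an alignment must have the same length")
        lengths.append(sizes.pop())
    n_len, c_len = lengths
    shared = sorted(set(n_term) & set(c_term))
    concatenated = {name: n_term[name] + c_term[name] for name in shared}
    partitions = {"N-terminal": (1, n_len), "C-terminal": (n_len + 1, n_len + c_len)}
    return concatenated, partitions


n_term = {"Aar_AcSp1": "A" * 510, "Other": "C" * 510}
c_term = {"Aar_AcSp1": "G" * 330, "Other": "T" * 330}
merged, parts = concatenate_alignments(n_term, c_term)
print(parts)  # {'N-terminal': (1, 510), 'C-terminal': (511, 840)}
```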

The 20 repeat units from the complete A. argentata
AcSp1 repetitive region were split into individual files
and aligned as above with individual repeat units from
published araneid AcSp1 sequences (Additional file 1:
Figure S3; accession numbers in Additional file 2:
Table S3). Alignments for the N- and C-terminal encoding
sequences obtained from the PCR survey of individual
genomes were created as above using the diagnosed
variants. Repetitive region alignments from the PCR
survey were created in the same way (alignments in
Additional file 1: Figures S4-S6).

For each nucleotide alignment, bootstrapping (5,000
replicates) and a maximum likelihood (ML) search for the
optimal tree were conducted simultaneously using RAxML
7.2.8 with the GTRGAMMA model [50,51] through the
CIPRES webserver [52]. As implemented through CIPRES,
RAxML offers two substitution models, GTRGAMMA
and GTRCAT; GTRGAMMA is considered the more
thorough of the two [50-52]. Accession numbers are given
in Additional file 2: Tables S2-S4.
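
The exact invocation used through CIPRES is not given. For readers reproducing the analysis with a local RAxML 7.x binary, a comparable command might be assembled as below, where `-f a` requests rapid bootstrapping and the best-scoring ML tree search in one run. The binary name, random seed, and file names are hypothetical, and the function only builds the argument list rather than running RAxML.

```python
import shlex


def raxml_command(alignment, run_name, replicates=5000, seed=12345, binary="raxmlHPC"):
    """Build a RAxML 7.x argument list for rapid bootstrapping plus an ML search.

    -f a: bootstrap replicates and best-scoring ML tree in one invocation.
    -x / -p: random seeds for bootstrapping and parsimony starting trees.
    The binary name and seed are assumptions, not values from the study.
    """
    return [
        binary,
        "-f", "a",
        "-m", "GTRGAMMA",
        "-s", alignment,
        "-n", run_name,
        "-x", str(seed),
        "-p", str(seed),
        "-#", str(replicates),
    ]


cmd = raxml_command("acsp1_cterm.phy", "cterm")
print(shlex.join(cmd))
# To run locally (requires RAxML on PATH): subprocess.run(cmd, check=True)
```

Quoting through `shlex.join` matters here because an unquoted `-#` would start a comment in most shells.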

Selection analyses 

Estimates of the number of nonsynonymous substitutions
per nonsynonymous site (dN) and synonymous substitutions
per synonymous site (dS) were produced for
three AcSp1 nucleotide alignments using MEGA5 [53]:
N-terminal encoding variants (Figure 4B; Additional file 1:
Figure S4), iterated repeats (Figure 3; Additional file 1:
Figure S3), and C-terminal encoding variants (Figure 4C;
Additional file 1: Figure S5). CodonTest [54] implemented
through the Datamonkey webserver [55,56] identified
the Felsenstein 1981 (F81) [57] model of codon substitution
as the best fit for all analyzed datasets. dN/dS
ratios less than, equal to, or greater than 1 were
interpreted as purifying selection, neutrality, or positive
selection, respectively.
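
MEGA5 produced the estimates; the sketch below only illustrates the arithmetic once the numbers of nonsynonymous and synonymous sites and differences have been counted (a Nei-Gojobori-style approach). It applies the Jukes-Cantor correction, which is the special case of F81 with equal base frequencies, as a simpler stand-in for the F81 model used in the study. The counts and function names are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """dN/dS from counted differences and sites (Nei-Gojobori-style inputs)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ZeroDivisionError("no synonymous differences: dN/dS is undefined")
    return dn / ds


def interpret(ratio, tol=0.0):
    """Map a dN/dS ratio to the interpretation used in the text."""
    if abs(ratio - 1) <= tol:
        return "neutrality"
    return "purifying selection" if ratio < 1 else "positive selection"


ratio = dn_ds(nonsyn_diffs=2, nonsyn_sites=100, syn_diffs=10, syn_sites=50)
print(round(ratio, 3), interpret(ratio))
```

With `tol=0.0` the neutral category requires a ratio of exactly 1, as stated in the text; in practice a tolerance or a statistical test is needed to call neutrality.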

Availability of supporting data 

All sequences generated in this study are deposited in
GenBank (KJ206570-KJ206620). Alignments used in
ML analyses are shown in the additional files.
Alignments and the corresponding trees for this study
are available at TreeBASE (http://purl.org/phylo/treebase/phylows/study/TB2:S15355).

Additional files 

Additional file 1: Figure S1. Nucleotide alignment of the 20 repeat
units (Aar_R) from the complete A. argentata AcSp1 repetitive region.
Alignment position numbers shown in increments of ten. Frame 1
translation is shown under nucleotide sequences. Alignment prepared with
Geneious v6.1.6 (Biomatters Ltd, Auckland, NZ). Figure S2. Nucleotide
alignment of 30 concatenated AcSp1 N- and C-terminal coding regions.
Alignment position numbers shown in increments of ten. Frame 1 translation
is shown under nucleotide sequences. Alignment positions 1-510 encompass
the N-terminal coding region, 511-840 the C-terminal coding region.
Alignment prepared with Geneious v6.1.6 (Biomatters Ltd, Auckland, NZ)
and available on TreeBASE (http://purl.org/phylo/treebase/phylows/study/TB2:S15355).
Abbreviations and accession numbers in Additional file 2:
Table S2. Figure S3. Nucleotide alignment of Araneidae AcSp1 iterated
repeats. Alignment position numbers shown in increments of ten. Frame 1
translation is shown under nucleotide sequences. Alignment prepared
with Geneious v6.1.6 (Biomatters Ltd, Auckland, NZ) and available on
TreeBASE (http://purl.org/phylo/treebase/phylows/study/TB2:S15355).
Abbreviations and accession numbers in Additional file 2: Table S3.
Figure S4. Nucleotide alignment of AcSp1 N-terminal coding variants
from PCR survey of individual Argiope genomes. Nucleotide alignment
in FASTA format. Alignment available on TreeBASE
(http://purl.org/phylo/treebase/phylows/study/TB2:S15355). Abbreviations:
A. argentata (Aarg), A. aurantia (Aau), and A. trifasciata (At). Figure S5.
Nucleotide alignment of AcSp1 C-terminal coding variants from PCR survey of
individual Argiope genomes. Nucleotide alignment in FASTA format.
Alignment available on TreeBASE
(http://purl.org/phylo/treebase/phylows/study/TB2:S15355). Abbreviations:
A. argentata (Aarg), A. aurantia (Aau), A. trifasciata (At), and Araneus
ventricosus (Av). Figure S6. Nucleotide alignment of AcSp1 repeat region
from PCR survey of individual Argiope genomes. Nucleotide alignment in
FASTA format. Alignment available on TreeBASE
(http://purl.org/phylo/treebase/phylows/study/TB2:S15355).
Abbreviations: A. argentata (Aarg), A. aurantia (Aau), A. trifasciata (At),
and Araneus ventricosus (Av).

Additional file 2: Table S1. Primers used for full-length Aar_AcSp1
sequencing and targeted amplification of N-terminal, repetitive, and
C-terminal coding regions. The name and sequence of primers designed for
primer walking during BAC clone sequencing and primers designed to amplify
N-terminal, repetitive, and C-terminal regions of Aar_AcSp1 are given. Table S2.
Accession numbers of spidroin sequences used for maximum likelihood
analysis of terminal regions (Figure 2). Spidroin name is the name used in this
manuscript. Species is the spider species that corresponds to the N- or
C-terminal accession number. If a full-length gene was used, only one
accession number appears. Table S3. Accession numbers for sequences used
in maximum likelihood analyses of iterated repeats (Figure 3). Spidroin name
is the name used in this manuscript. Species is the spider species that
corresponds to the N- or C-terminal accession number. Table S4.
Accession numbers for AcSp1 sequences generated in this study and
used in maximum likelihood analyses of repeat region and N- and
C-terminal encoding variants (Figure 4). GenBank abbreviations, species,
and accession numbers are given. Table S5. Predicted amino acid
composition and codon usage of the coding region of Aar_AcSp1. The
percentage of Aar_AcSp1 composed of each amino acid and the percentage
of each codon used for each amino acid are given. Table S6. Overall and
putative helical region pairwise identities of the A. trifasciata consensus
AcSp1 repeat aligned to consensus AcSp1 repeat subparts of L. hesperus
and U. diversus. Consensus repeat sequences from each subpart of
AcSp1 repeats from L. hesperus and U. diversus were aligned to a
consensus repeat from A. trifasciata. Overall pairwise percent identity and
percent identity for each of six helical regions, as predicted by
Xu et al. [32], are shown.